Parsing Strings with split

Download examples from stest.zip

When strings contain multiple pieces of information (for example, when reading in data from a file on a line-by-line basis), we need to parse (i.e., divide up) each string to extract the individual pieces.

The following jargon is often used:

token: one piece of information, a “word”
delimiter: one (or more) characters used to separate tokens
parsing: dividing a string into tokens based on the given delimiters

Strings in Java can be parsed using the split method of the String class.  This page gives a brief overview (and some examples) of some of the common (and easiest) ways to use the split method.

For more detailed information, see the Java API documentation for split.

Example 1

We want to divide up a phrase into words where spaces are used to separate words. For example

the music made   it   hard      to        concentrate

In this case, we have just one delimiter (space), and consecutive delimiters (i.e., several spaces in a row) should be treated as one delimiter. To parse this string in Java, we do:

String phrase = "the music made   it   hard      to        concentrate";
String delims = "[ ]+";
String[] tokens = phrase.split(delims);

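Note that the square brackets enclose the set of delimiter characters, and the plus sign (+) indicates that consecutive delimiters should be treated as one. This pattern is a regular expression. The split method returns an array containing the tokens as strings.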

See full code stest1.java
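
To see what the tokens are, a loop over the returned array is enough. The following is a minimal sketch of that idea. The class name SplitWordsDemo and the helper splitOnSpaces are hypothetical names chosen for illustration and are not taken from stest1.java.

```java
// Hypothetical demo class (illustrative name, not from stest1.java).
class SplitWordsDemo {
    // Splits a phrase on runs of one or more spaces.
    static String[] splitOnSpaces(String phrase) {
        String delims = "[ ]+";
        return phrase.split(delims);
    }

    public static void main(String[] args) {
        String phrase = "the music made   it   hard      to        concentrate";
        String[] tokens = splitOnSpaces(phrase);
        // Print each token on its own line, then the token count.
        for (String token : tokens) {
            System.out.println(token);
        }
        System.out.println("count = " + tokens.length); // prints "count = 7"
    }
}
```

Because of the plus sign, the runs of several spaces produce no empty tokens, so the array holds exactly the seven words.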

Example 2

Suppose each string contains an employee's last name, first name, employee ID#, and the number of hours worked for each day of the week, separated by commas. So

Smith,Katie,3014,,8.25,6.5,,,10.75,8.5

represents an employee named Katie Smith, whose ID was 3014, and who worked 8.25 hours on Monday, 6.5 hours on Tuesday, 10.75 hours on Friday, and 8.5 hours on Saturday (the week starts on Sunday, so the first, empty hours field is Sunday). In this case, we have just one delimiter (comma), and consecutive delimiters (i.e., more than one comma in a row) should not be treated as one. To parse this string, we do:

String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";
String delims = "[,]";
String[] tokens = employee.split(delims);

After this code executes, the tokens array will contain ten strings (note the empty strings): "Smith", "Katie", "3014", "", "8.25", "6.5", "", "", "10.75", "8.5"
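
Once the record is split, the empty strings have to be handled before the hours can be used as numbers. The sketch below shows one possible approach, assuming an empty field means zero hours worked and that the hours begin at the fourth token. The class name EmployeeHoursDemo and the method totalHours are hypothetical and not part of stest2.java.

```java
// Hypothetical demo class (illustrative name, not from stest2.java).
class EmployeeHoursDemo {
    // Sums the hours fields (tokens 3 onward).
    // Assumption: an empty field means no hours were worked that day.
    static double totalHours(String record) {
        String[] tokens = record.split("[,]");
        double total = 0.0;
        for (int i = 3; i < tokens.length; i++) {
            if (!tokens[i].isEmpty()) {
                total += Double.parseDouble(tokens[i]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String employee = "Smith,Katie,3014,,8.25,6.5,,,10.75,8.5";
        String[] tokens = employee.split("[,]");
        // tokens[0] is the last name, tokens[1] the first name.
        System.out.println(tokens[1] + " " + tokens[0] + " worked "
                + totalHours(employee) + " hours"); // prints "Katie Smith worked 34.0 hours"
    }
}
```

The loop is bounded by tokens.length instead of a fixed count of seven days because split drops trailing empty strings, so a record whose last days are empty yields a shorter array.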

There is one small wrinkle to be aware of (regardless of how consecutive delimiters are handled): if the string starts with one (or more) delimiters, then the first token will be the empty string ("").
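
The leading-delimiter wrinkle can be checked with a short experiment. The sketch below uses hypothetical names (LeadingDelimiterDemo, splitOnCommas, splitKeepTrailing, splitOnSpaces) and made-up input strings. It also shows a related behavior of the one-argument split: trailing empty strings are dropped unless a negative limit is passed as the second argument.

```java
// Hypothetical demo class with made-up inputs, for illustration only.
class LeadingDelimiterDemo {
    // One-argument split: keeps leading empty tokens, drops trailing ones.
    static String[] splitOnCommas(String s) {
        return s.split("[,]");
    }

    // A negative limit keeps trailing empty tokens as well.
    static String[] splitKeepTrailing(String s) {
        return s.split("[,]", -1);
    }

    // Consecutive spaces treated as one delimiter.
    static String[] splitOnSpaces(String s) {
        return s.split("[ ]+");
    }

    public static void main(String[] args) {
        // Leading commas give empty first tokens; trailing commas are ignored.
        System.out.println(splitOnCommas(",,a,b,,").length);     // prints 4
        System.out.println(splitKeepTrailing(",,a,b,,").length); // prints 6

        // Even with "+", leading spaces still give one empty first token.
        String[] words = splitOnSpaces("  hello world");
        System.out.println(words.length);        // prints 3
        System.out.println(words[0].isEmpty());  // prints true

        // One way to avoid it for spaces: trim the string before splitting.
        System.out.println(splitOnSpaces("  hello world".trim()).length); // prints 2
    }
}
```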

See full code stest2.java

Example 3

Suppose we have a string containing several English sentences that uses only commas, periods, question marks, and exclamation points as punctuation.  We wish to extract the individual words in the string (excluding the punctuation).  In this situation we have several delimiters (the space and the punctuation marks) and we want to treat consecutive delimiters as one:

String str = "This is a sentence.  This is a question, right?  Yes!  It is.";
String delims = "[ .,?!]+";
String[] tokens = str.split(delims);


All we had to do was list all the delimiter characters inside the square brackets ( [ ] ).

See full code stest3.java
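
The following sketch runs this kind of multi-delimiter split end to end. It assumes the space is listed in the delimiter set together with the four punctuation marks, since otherwise the words inside each sentence would stay joined. The class name SentenceWordsDemo and the method extractWords are hypothetical and not taken from stest3.java.

```java
// Hypothetical demo class (illustrative name, not from stest3.java).
class SentenceWordsDemo {
    // Splits on runs of spaces, periods, commas, question marks, and
    // exclamation points. Assumption: the space is part of the delimiter set.
    static String[] extractWords(String text) {
        String delims = "[ .,?!]+";
        return text.split(delims);
    }

    public static void main(String[] args) {
        String str = "This is a sentence.  This is a question, right?  Yes!  It is.";
        String[] tokens = extractWords(str);
        for (String token : tokens) {
            System.out.println(token);
        }
        System.out.println("count = " + tokens.length); // prints "count = 12"
    }
}
```

The final period does not produce an extra token, because the empty string that follows it is a trailing empty string and split drops those.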

White Space

To use consecutive blanks as a delimiter we assign delims = "[ ]+";

To use white space as a delimiter, assign delims = "[\\s]+";

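Note that \\ is required in a Java string literal to produce a single backslash character, so the regular expression that split receives is [\s]+.

A whitespace character: \s is the same as [ \t\n\x0B\f\r], where \t is a tab, \n is a newline (line feed), \x0B is a vertical tab, \f is a form feed, and \r is a carriage return.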

See full code stest4.java
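
As a small illustration of the white space delimiter, the sketch below splits a made-up string that mixes spaces, a tab, newlines, and a carriage return. The class name WhitespaceDemo, the method splitOnWhitespace, and the input string are hypothetical and not taken from stest4.java.

```java
// Hypothetical demo class with a made-up input, for illustration only.
class WhitespaceDemo {
    // Splits on runs of any whitespace characters (space, tab, newline, etc.).
    static String[] splitOnWhitespace(String s) {
        String delims = "[\\s]+";
        return s.split(delims);
    }

    public static void main(String[] args) {
        String text = "alpha \t beta\n\ngamma\r\n delta";
        String[] tokens = splitOnWhitespace(text);
        for (String token : tokens) {
            System.out.println(token);
        }
        System.out.println("count = " + tokens.length); // prints "count = 4"
    }
}
```

With "[ ]+" in place of "[\\s]+", the tab and line breaks would remain inside the tokens, since only the space would count as a delimiter.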

References

derived from parseString. See also code examples


Maintained by John Loomis, last updated 30 January 2018